Analyzing the Load Balance of Term-based Partitioning
Abstract
In parallel information retrieval (IR) systems, where a large-scale collection is indexed and searched, the query response time is bounded by the response time of the slowest node in the system. Distributing the load equally across the nodes is therefore a very important issue. There are two main methods for collection indexing, namely document-based and term-based indexing. In term-based partitioning, the terms of the global index of a large-scale data collection are partitioned equally among the nodes; a given query is then divided into sub-queries, and each sub-query is directed to the relevant node. This provides high query throughput and concurrency but poor parallelism and load balance. In this paper, we introduce new methods for term partitioning and compare their results with those of previous work with respect to load balance and query response time.
Keywords: Term-partitioning schemes, Term-frequency partitioning, Term-length partitioning, Node utilization, Load balance
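The term-based scheme described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the round-robin assignment in `partition_terms`, the use of posting-list length as a proxy for per-term work, the max-to-mean `imbalance` ratio, and the toy data. None of it is the partitioning method or the metric proposed in the paper.

```python
from collections import defaultdict

def partition_terms(posting_lengths, num_nodes):
    """Assign each term to one node, round-robin over the sorted vocabulary.

    Hypothetical scheme: it gives every node the same number of terms,
    but ignores how expensive each term is to process.
    """
    return {term: i % num_nodes
            for i, term in enumerate(sorted(posting_lengths))}

def split_query(query_terms, assignment):
    """Divide a query into sub-queries, one per node that owns a query term."""
    sub_queries = defaultdict(list)
    for term in query_terms:
        if term in assignment:  # terms missing from the index are skipped
            sub_queries[assignment[term]].append(term)
    return dict(sub_queries)

def node_loads(queries, assignment, posting_lengths, num_nodes):
    """Sum the assumed work per node: posting-list length of each term served."""
    loads = [0] * num_nodes
    for query in queries:
        for node, terms in split_query(query, assignment).items():
            loads[node] += sum(posting_lengths[t] for t in terms)
    return loads

def imbalance(loads):
    """Ratio of the busiest node's load to the mean load (1.0 is perfect balance)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0

# Toy data (invented): term -> posting-list length.
posting_lengths = {"data": 900, "index": 400, "load": 200,
                   "node": 300, "query": 700, "term": 500}
queries = [["query", "load"], ["data", "index"], ["term", "query"]]

assignment = partition_terms(posting_lengths, num_nodes=2)
loads = node_loads(queries, assignment, posting_lengths, num_nodes=2)
print(loads, round(imbalance(loads), 3))
```

In this toy run both nodes hold three terms each, yet one node carries most of the work because it owns the terms with the longest posting lists. This is the kind of imbalance that an equal-count term partitioning can produce.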
Similar resources
Improving Load Balance and Query Throughput of Distributed IR Systems
As the number of queries grows over time, an Information Retrieval (IR) system must provide a high query processing rate, i.e., high query throughput. In IR systems, there are three types of data partitioning, namely term-based, document-based, and hybrid partitioning. In document-based and hybrid partitioning, the query is sent to all nodes and thus a high level of parallelism is achie...
A Two-Tier Distributed Full-Text Indexing System
The performance of indexing systems is very important for a search engine. Usually, indexing systems on large-scale clusters can provide high search efficiency, but they incur high hardware costs. The costs would be greatly reduced if a distributed indexing system ran on small-scale clusters connected by the Internet. Two current inverted file partitioning schemes: document partitioning an...
A Multiway Design-driven Partitioning Algorithm for Distributed Verilog Simulation
Many partitioning algorithms have been proposed for distributed Very-large-scale integration (VLSI) simulation. Typically, they make use of a gate level netlist and attempt to achieve a minimal cutsize subject to a load balance constraint. The algorithm executes on a hypergraph which represents the netlist. We propose a design-driven iterative partitioning algorithm for Verilog based on module ...
Highly scalable SFC-based dynamic load balancing and its application to atmospheric modeling
Load balance is one of the major challenges for efficient supercomputing, especially for applications that exhibit workload variations. Various dynamic load balancing and workload partitioning methods have been developed to handle this issue by migrating workload between nodes periodically during the runtime. However, on today’s top HPC systems – and even more so on future exascale systems – ru...
A Load Balancing Model Based on Cloud Partitioning for the Public Cloud
Load balancing in the cloud computing environment has an important impact on the performance. Good load balancing makes cloud computing more efficient and improves user satisfaction. This article introduces a better load balance model for the public cloud based on the cloud partitioning concept with a switch mechanism to choose different strategies for different situations. The algorithm applie...